Abstract

The Mark 5C disk-based VLBI data system is being developed as the third-generation Mark 5 disk- 
based system, increasing the sustained data-recording rate capability to 4 Gbps. It is built on the same 
basic platform as the Mark 5A, Mark 5B and Mark 5B+ systems and will use the same 8-disk modules 
as earlier Mark 5 systems, although two 8-disk modules will be necessary to support the 4 Gbps rate. 

Unlike its earlier brethren, which use proprietary data interfaces, the Mark 5C will accept data from a 
standard 10 Gigabit Ethernet connection and be compatible with the emerging VLBI Data Interchange 
Format (VDIF) standard. Data sources for the Mark 5C system will be based on new digital backends 
now being developed, specifically the RDBE in the U.S. and the dBBC in Europe, as well as others. 

The Mark 5C system is being planned for use with the VLBI2010 system and will also be used by 
NRAO as part of the VLBA sensitivity upgrade program; it will also be available to the global VLBI 
community from Conduant. Mark 5C system specification and development is supported by Haystack 
Observatory, NRAO, and Conduant Corporation. Prototype Mark 5C systems are expected in early 
2010.

1. Introduction 

The Mark 5C is being designed as the next-generation Mark 5 system, with a capability of 
recording sustained data rates to 4096 Mbps. It will use the same disk modules as the Mark 5A 
and Mark 5B, thus preserving existing investments in disk modules. The data interface for both 
recording and playback will be 10 Gigabit Ethernet, which is rapidly becoming a widely supported 
standard. The use of 10GigE interfaces comes with some significant implications, however. First,
data sources must be designed to provide data streams in a format compatible with the Mark 5C
requirements. Second, data playback through a 10GigE interface is a good match for a rising
generation of software correlators. In the interests of backwards compatibility, the Mark 5C will
also support a mode that writes disk modules in Mark 5B data format, which can be correlated
on existing Mark IV correlators that support Mark 5B.

The Mark 5C will be implemented using the existing Amazon StreamStor disk-interface card 
(used in the Mark 5B+) from Conduant Corporation along with a new 10GigE-specific interface
daughterboard being designed by Conduant. Unlike the Mark 5A and Mark 5B, no separate
specialized I/O card will be necessary in the Mark 5C. 

2. Data Sources 

One major implication of the Mark 5C model is that the data source is responsible for all data 
time-tagging, formatting, and creation of Ethernet packets. This is a departure from the VSI-H 
model used by the Mark 5B, which has basically only 32 parallel sample bit-streams, a clock, and a
1pps tick flowing between the data source and the Mark 5B, with the Mark 5B being responsible
for creating data frames with higher level time-tagging and formatting. 

Fortunately, VLBI data sources capable of creating such formatted Ethernet packets are now 
being developed in both the U.S. and Europe as part of the development of digital downconverters 
and backends. Several different digital-backend systems with 10GigE output suitable for recording
by the Mark 5C are planned to be available in 2010. The details of the data formats to be
provided to the Mark 5C are specified in a separate document, “Mark 5C Data-Frame Specification”.
Normally, each Ethernet packet from the data source will contain sample data from only a single 
frequency channel, although a Mark 5B-compatible data mode is specified which will write disk 
modules in a format that can be re-played on a Mark 5B playback unit; this will provide the ability 
to process the recorded data on existing Mark IV hardware correlators. 

3. Correlation 

A major shift is currently developing to move from hardware-based correlators to software- 
based correlators, some of which already exist. Unlike the Mark 5A and Mark 5B, the Mark 5C 
will have no streaming hardware playback interface. Instead, the data files will appear to the user 
as standard Linux files and will be read as such. We expect that the standard interface for playback 
to a correlator will be through a standard 10GigE interface implemented on a commercial NIC.
Unlike existing hardware correlators, software correlators do not demand constant-rate streaming 
inputs. As such, the Mark 5C playback is well-suited for interfacing to software correlators, but 
not well-suited for or intended to interface to hardware correlators. 
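
Because scans are to be read as ordinary files, a software correlator can pull data at its own pace rather than at a constant streaming rate. The following is a minimal sketch of that access pattern, assuming the caller already knows the fixed frame length used during the scan; the function name `iter_frames` and the in-memory stand-in for a scan file are illustrative and not part of any Mark 5C software.

```python
import io
from typing import BinaryIO, Iterator


def iter_frames(stream: BinaryIO, frame_bytes: int) -> Iterator[bytes]:
    """Yield successive fixed-length frames from a binary stream.

    Reading stops at end of data; a short trailing fragment is discarded.
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    while True:
        frame = stream.read(frame_bytes)
        if len(frame) < frame_bytes:
            return
        yield frame


# Stand-in for an opened scan file: 24 bytes read as three 8-byte frames.
demo = io.BytesIO(bytes(range(24)))
frames = list(iter_frames(demo, 8))
```

With a real scan, `stream` would simply be the file object returned by `open(path, "rb")`, where the path is whatever name the host software assigns to the scan.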

4. General Mark 5C Characteristics 

The Mark 5C will have the following characteristics: 

• Mark 5C will be fully compatible with all existing Mark 5 disk modules; however, some 
modules with older disks may limit record/playback data rates. 

• At data rates above about 2 Gbps, it will be necessary to record to two 8-disk modules 
simultaneously, in so-called “non-bank” mode, which is not normally used by Mark 5A or 
Mark 5B/5B+. 

• A 10GigE interface will be implemented on the Amazon StreamStor disk-
interface card (currently used in the Mark 5B+) by replacing the FPDP I/O daughterboard
on the Amazon card with a newly designed 10GigE daughterboard. This 10GigE interface
will be receive-only, optimized for sustained real-time recording of at least 4096 Mbps from
a data source. Received Ethernet packets can be OSI Layer 2 or higher but will only be 
processed by the Mark 5C at the Layer 2 level. Jumbo Ethernet packets up to 9000 bytes 
will be supported. The data source is required only to transmit Ethernet packets and is not 
required to process any received packets. 

• The entire data payload from each arriving Ethernet packet, sans a specified length of pay- 
load header (which may contain higher OSI Layer parameters or other information), will be 
recorded to disk. In this sense, the Mark 5C is entirely ‘formatless’; i.e., all data formatting 
must be done by the data source. This allows each user to format the recorded data according
to his/her needs. 

• The Ethernet data payload may contain a user-generated 32-bit “Packet Sequence Number” 
(PSN), whose position within the data payload can be specified to the Mark 5C. The Mark 
5C can be commanded to a “PSN monitor mode” that will parse this serial number from 
every packet to identify missing or out-of-order packets. Out-of-order packets, within some 
reasonable limits, will be restored to proper order, while the user data from each missing 
packet will be replaced by user-specified “fill-pattern” data. The MSB of the PSN may also 
be used as an ‘invalid’ marker to prevent recording data from a packet. If “PSN monitor 
mode” is disabled, data are recorded to disk in the order that packets are received; no checks 
are made for out-of-order or missing packets. 

• Similar to the Mark 5A/B, the Mark 5C will record data as “scans”, where a scan is defined 
as the period between starting and ending the recording of a particular observation. The 
duration of a scan may be from several seconds to many minutes. The host application 
software will maintain a directory of scans for easy identification and access. No duplicate- 
named scans are allowed. 

• Scans will appear as normal Linux files to the host PC. Data playback on the Mark 5C will 
be through a 10GigE NIC interface on the host PC. A planned upgrade by Conduant of
the Amazon card, which interfaces to the PCI-X bus, will support the PCI-e bus to allow 
substantially higher playback rates from Mark 5 disk modules. 
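
The “PSN monitor mode” described in the list above can be illustrated with a short sketch. It is not the StreamStor implementation; it assumes packets arrive as `(psn, payload)` pairs, that a gap is declared missing once more than `window` later packets are waiting, that packets flagged invalid do not consume a sequence number, and that the 32-bit counter does not wrap during the run. The names `psn_monitor`, `window`, and `fill` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

INVALID_BIT = 0x80000000  # MSB of the 32-bit PSN used as the 'invalid' marker


def psn_monitor(packets: Iterable[Tuple[int, bytes]],
                fill: bytes,
                window: int = 4) -> Iterator[bytes]:
    """Yield payloads in PSN order, substituting `fill` for missing packets."""
    pending: Dict[int, bytes] = {}
    expected: Optional[int] = None
    for psn, payload in packets:
        if psn & INVALID_BIT:
            continue                      # flagged invalid: never recorded
        if expected is None:
            expected = psn                # first valid packet sets the origin
        if psn < expected:
            continue                      # arrived after its slot was filled
        pending[psn] = payload
        while pending:
            if expected in pending:
                yield pending.pop(expected)
            elif len(pending) > window:
                yield fill                # give up waiting: record fill pattern
            else:
                break                     # keep waiting for the expected PSN
            expected += 1
    while pending:                        # end of stream: flush what remains
        yield pending.pop(expected, fill)
        expected += 1


# Packet 1 arrives late, packet 3 never arrives, one packet is flagged invalid.
demo = [(0, b"A"), (2, b"C"), (1, b"B"), (INVALID_BIT | 9, b"X"), (4, b"E")]
recorded = list(psn_monitor(demo, fill=b"F"))
```

In this example `recorded` holds the payloads in sequence order, with the fill pattern standing in for packet 3.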

5. Physical Connections 

Because the interconnection between the data source and the Mark 5C is a standard 10 Gigabit 
Ethernet connection, there is considerable flexibility in connection topology between data-source 
units and Mark 5C units. Figure 1 illustrates the general case where multiple data-source units 
provide data to multiple Mark 5C units through a standard Ethernet switch. The Ethernet switch 
allows Ethernet packets from either of the two DBE2 units to be routed to any of the four
Mark 5C units. This provides an easy way to route data packets in an arbitrary manner
and is particularly useful for managing data-rate mismatches between individual data sources and
individual Mark 5Cs. For example, an 8 Gbps packet stream from a single DBE2 could be separated
into two 4 Gbps packet streams to two Mark 5C units.
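
As a planning aid only, the rate-matching idea can be sketched as a simple first-fit assignment of packet streams to recorders. This assumes each stream is addressed to one recorder as a whole and that a recorder sustains 4096 Mbps; the function `assign_streams` and the stream names are hypothetical and say nothing about how a particular switch or data source is configured.

```python
from typing import Dict, List

RECORDER_LIMIT_MBPS = 4096  # sustained recording rate quoted for one Mark 5C


def assign_streams(rates_mbps: Dict[str, int],
                   limit: int = RECORDER_LIMIT_MBPS) -> List[List[str]]:
    """Group streams so that no recorder exceeds `limit` (first-fit, largest first)."""
    recorders: List[List[str]] = []
    loads: List[int] = []
    for name, rate in sorted(rates_mbps.items(), key=lambda kv: -kv[1]):
        if rate > limit:
            raise ValueError(f"stream {name} alone exceeds {limit} Mbps")
        for i, load in enumerate(loads):
            if load + rate <= limit:
                recorders[i].append(name)
                loads[i] += rate
                break
        else:
            recorders.append([name])      # no existing recorder has room
            loads.append(rate)
    return recorders


# Sixteen 512 Mbps streams (8 Gbps in total) need two recorders.
plan = assign_streams({f"ch{i:02d}": 512 for i in range(16)})
```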

6. Data-organization Philosophy 

Previous generations of VLBI data recorders, such as Mark III/3A/4/5A/5B, have generally
formatted multiple observing frequency channels as multiple parallel bit-streams representing si-
multaneous samples from all channels. Also, typically, the per-channel data rate is constrained
to 2^n Mbps, and the number of channels is also usually constrained to be an integer power of 2,
thereby also constraining total data rates to be 2^n Mbps. This type of data organization has been
satisfactory for the type of hardware correlators that have been predominant over the past 30 
years, but it is not well matched to today’s needs. 

Figure 1. Generic signal-connection diagram for Mark 5C.

Modern VLBI data management relies increasingly heavily on the use of standardized local
and global computer networks whose capacities are not well matched to the 2^n Mbps paradigm
of established VLBI data systems. Furthermore, VLBI correlation processing is moving rapidly
towards fully software solutions operating on clustered computer networks. Both of these trends
argue for improved flexibility in aggregate VLBI data rates and repackaging of data into more
convenient forms for processing by standard computer hardware.

7. Mark 5C Data-frame Format 

The Mark 5C itself is data-format independent and simply records Ethernet packets which are 
sent to it. However, most usage of the Mark 5C is expected to utilize the VLBI Data Interchange 
Format (VDIF) specification ratified by the global VLBI community in 2009. 

The primary elements of the VDIF specification, as applied to the Mark 5C, are as follows: 

1. Each Ethernet packet arriving at the Mark 5C contains one self-identifying VDIF Data 
Frame. 

2. Each Data Frame contains a 16-byte or 32-byte Data Frame Header followed by a Data 
Array. 

3. The Data Array of a Data Frame may contain either single-channel or multi-channel data, 
but most Mark 5C usage is expected to utilize multiple “threads” of single-channel Data 
Frames. 

4. The Data Frame length must be constant during a given scan (defined as the period between 
starting and ending the recording of a particular observation) but can vary from scan to scan 
if necessary. 

5. The Data Frame length (including header) must meet the following three criteria: 

(a) Must be within the range 64-9000 bytes 

(b) Must be a multiple of 8 bytes for compatibility with the StreamStor disk-addressing 
algorithm 

(c) Must result in an integer number of complete Data Frames per second 
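
The three length criteria can be checked mechanically. The sketch below assumes that criterion (c) is evaluated on the Data Array alone (frame length minus header) against the data rate of the stream carried by those frames, and that rates are given in bits per second; the function name `frame_length_ok` is illustrative.

```python
def frame_length_ok(frame_bytes: int, header_bytes: int, rate_bps: int) -> bool:
    """Return True if a Data Frame length meets criteria (a), (b) and (c)."""
    if header_bytes not in (16, 32):
        raise ValueError("header must be 16 or 32 bytes")
    if not 64 <= frame_bytes <= 9000:         # (a) allowed size range
        return False
    if frame_bytes % 8 != 0:                  # (b) multiple of 8 bytes
        return False
    payload_bits = (frame_bytes - header_bytes) * 8
    return rate_bps % payload_bits == 0       # (c) whole frames per second


# A 128 Mbps stream with 8000-byte Data Arrays gives exactly 2000 frames/s.
ok = frame_length_ok(8032, 32, 128_000_000)
# 8192-byte Data Arrays would give 1953.125 frames/s, so this length fails.
bad = frame_length_ok(8224, 32, 128_000_000)
```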

The currently proposed format of a VDIF data-frame header is shown in Figure 2. 

Some of the fields in the VDIF data-frame header are self-explanatory. The detailed VDIF 
specification is available at http://vlbi.org/docs/VDIF specification Release 1.0 ratified.pdf.
          Bit 31 (MSB)                                            Bit 0 (LSB)
          Byte 3           Byte 2           Byte 1           Byte 0

Word 0    I (1) | L (1) | Seconds from reference epoch (30)
Word 1    Unassigned (2) | Ref Epoch (6) | Data Frame # within second (24)
Word 2    V (3) | log2(#chns) (5) | Data Frame length (units of 8 bytes) (24)
Word 3    C (1) | bits/sample-1 (5) | Thread ID (10) | Station ID (16)
Word 4    EDV (8) | Extended User Data (24)
Word 5    Extended User Data (32)
Word 6    Extended User Data (32)
Word 7    Extended User Data (32)

Figure 2. VDIF Data Frame Header format; numbers in parentheses are field lengths in bits.
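
To make the bit layout of Figure 2 concrete, the following sketch packs and unpacks the first four header words. The field positions follow the figure, reading each word from bit 31 down to bit 0. The descriptive field names (for example `invalid` for I, `legacy` for L, `version` for V, `complex` for C) and the little-endian word order are assumptions of this sketch and should be checked against the VDIF specification before use.

```python
import struct
from typing import Dict

# field name -> (word index, position of least-significant bit, width in bits)
FIELDS = {
    "invalid":        (0, 31, 1),
    "legacy":         (0, 30, 1),
    "seconds":        (0, 0, 30),
    "ref_epoch":      (1, 24, 6),
    "frame_number":   (1, 0, 24),
    "version":        (2, 29, 3),
    "log2_channels":  (2, 24, 5),
    "frame_length_8": (2, 0, 24),   # Data Frame length in units of 8 bytes
    "complex":        (3, 31, 1),
    "bits_minus_1":   (3, 26, 5),
    "thread_id":      (3, 16, 10),
    "station_id":     (3, 0, 16),
}


def pack_header(values: Dict[str, int]) -> bytes:
    """Pack words 0-3 of the header; fields not supplied default to zero."""
    words = [0, 0, 0, 0]
    for name, (word, shift, width) in FIELDS.items():
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        words[word] |= value << shift
    return struct.pack("<4I", *words)   # little-endian is an assumption here


def unpack_header(raw: bytes) -> Dict[str, int]:
    """Extract every field in FIELDS from the first 16 bytes of a frame."""
    words = struct.unpack("<4I", raw[:16])
    return {name: (words[w] >> s) & ((1 << n) - 1)
            for name, (w, s, n) in FIELDS.items()}


# Round trip for an 8032-byte frame (1004 units of 8 bytes), 2 bits/sample.
hdr = {"seconds": 12345, "frame_number": 7, "frame_length_8": 1004,
       "bits_minus_1": 1, "thread_id": 3, "station_id": 0x4142}
raw = pack_header(hdr)
decoded = unpack_header(raw)
```

Words 4 to 7 (EDV and Extended User Data) are left out because their content is user-defined.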


8. Slow-disk Management

Considerable experience has shown a low, but significant, occurrence of “slow” disks, that is,
disks that can still read and write, but at a reduced data rate, presumably due to marginal or
bad sectors on the disk. Normal RAID systems will simply slow down to accommodate such disks 
in a RAID array since they are not designed for real-time operation. The StreamStor disk-interface 
cards in the Mark 5 units are designed to re-distribute the load to normal-performing disks in the 
face of one or more “slow” disks in the recording array so as not to lose critical real-time data. 
Playback through a software correlator is generally non-critical as to playback rate, though the 
correlation speed may be slower than normal. If correlation speed is important, a playback mode 
may be invoked whereby data not received in time from slow disks will be replaced with a specified 
“fill pattern”. 

9. Upgrade Path from Mark 5A, Mark 5B, or Mark 5B+ 

The Mark 5C requires an Amazon StreamStor card (~$9500) plus a 10 Gigabit Ethernet
daughterboard (~$3-4K); no VLBI-specific I/O board is required. An upgrade from a Mark 5A or
Mark 5B requires both an Amazon board and a 10GigE daughterboard. An upgrade from a Mark 5B+
requires only a 10GigE daughterboard.

10. Summary 

The Mark 5C development is being supported by Haystack Observatory, NRAO, and Conduant 
Corporation. Prototype Mark 5C systems are expected in early 2010 and should be available to 
the VLBI community in the latter half of 2010. 

Detailed specifications for the Mark 5C are available at http://www.haystack.edu/tech/vlbi/Mark 
5/Mark 5-memos in memo number 57; other information related to Mark 5C is also available at 
this URL. 